Ultra-fine entity typing (UFET) predicts extremely free-form types (e.g., president, politician) of a given entity mention (e.g., Joe Biden) in context. State-of-the-art (SOTA) methods use a cross-encoder (CE) architecture. CE concatenates the mention (and its context) with each type and feeds each pair into a pretrained language model (PLM) to score their relevance. This brings deeper interaction between mention and types and thus better performance, but it requires N (the type set size) forward passes to infer the types of a single mention. CE is therefore very slow at inference when the type set is large (e.g., N = 10k for UFET). To address this, we propose to perform entity typing in a recall-expand-filter manner. The recall and expand stages prune the large type set and generate the K (typically less than 256) most relevant type candidates for each mention. At the filter stage, we use a novel model called MCCE to concurrently encode and score these K candidates in only one forward pass to obtain the final type prediction. We investigate different variants of MCCE, and extensive experiments show that MCCE under our paradigm reaches SOTA performance on ultra-fine entity typing and is thousands of times faster than the cross-encoder. We also find that MCCE is very effective in fine-grained (130 types) and coarse-grained (9 types) entity typing. Our code is available at https://github.com/modelscope/AdaSeq/tree/master/examples/MCCE.
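The recall-expand-filter flow described in this abstract can be sketched roughly as below. This is only an illustration under stated assumptions, not the MCCE implementation: the cheap recall scores, the lexical-match expansion heuristic, and `toy_scorer` (a stand-in for a model that scores all candidates in one forward pass) are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Set


def recall(cheap_scores: Dict[str, float], k: int) -> List[str]:
    """Recall stage: keep the k types with the highest cheap score."""
    return sorted(cheap_scores, key=lambda t: cheap_scores[t], reverse=True)[:k]


def expand(candidates: List[str], context_tokens: Sequence[str],
           type_set: Set[str], max_size: int) -> List[str]:
    """Expand stage (assumed heuristic): add types that appear verbatim in the context."""
    expanded = list(candidates)
    for token in context_tokens:
        if token in type_set and token not in expanded and len(expanded) < max_size:
            expanded.append(token)
    return expanded


def filter_stage(candidates: List[str],
                 score_all: Callable[[List[str]], List[float]],
                 threshold: float) -> List[str]:
    """Filter stage: score every candidate with ONE call, then threshold."""
    scores = score_all(candidates)
    return [t for t, s in zip(candidates, scores) if s >= threshold]


# Toy data (hypothetical): a tiny type set and cheap recall scores for one mention.
TYPE_SET = {"person", "politician", "president", "leader", "city", "company"}
cheap_scores = {"person": 0.9, "politician": 0.7, "city": 0.4,
                "president": 0.3, "leader": 0.2, "company": 0.1}
context_tokens = ["president", "Joe", "Biden", "spoke"]

forward_passes = {"count": 0}


def toy_scorer(candidates: List[str]) -> List[float]:
    """Stand-in for a model that encodes all candidates concurrently."""
    forward_passes["count"] += 1
    relevant = {"person", "politician", "president"}
    return [1.0 if t in relevant else 0.0 for t in candidates]


recalled = recall(cheap_scores, k=3)
candidates = expand(recalled, context_tokens, TYPE_SET, max_size=4)
predicted = filter_stage(candidates, toy_scorer, threshold=0.5)
print(predicted, "forward passes:", forward_passes["count"])
```

A per-type cross-encoder would call the scorer once per type in the full type set, whereas here the scorer is invoked once for the pruned candidate list.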
Modeling the noise transition matrix is a promising approach to learning with label noise. Based on the estimated noise transition matrix and the noisy posterior probabilities, the clean posterior probabilities, which are jointly called the Label Distribution (LD) in this paper, can be calculated and used as supervision. To estimate the noise transition matrix reliably, some methods assume that anchor points are available during training. If the anchor points are invalid, however, the noise transition matrix may be poorly learned, resulting in poor performance. Other methods therefore treat reliable data points extracted from the training data as pseudo anchor points. From a statistical point of view, however, the noise transition matrix can be inferred from data with noisy labels under the clean-label-domination assumption. We therefore aim to estimate the noise transition matrix without (pseudo) anchor points. There is evidence that samples are more likely to be mislabeled as similar classes, which means the mislabeling probability is highly correlated with the inter-class correlation. Inspired by this observation, we propose an instance-specific Label Distribution Regularization (LDR), in which the instance-specific LD is estimated and used as supervision, to prevent DCNNs from memorizing noisy labels. Specifically, we estimate the noisy posterior under the supervision of noisy labels, and approximate the batch-level noise transition matrix by estimating the inter-class correlation matrix with neither anchor points nor pseudo anchor points. Experimental results on two synthetic noisy datasets and two real-world noisy datasets demonstrate that our LDR outperforms existing methods.
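The general relationship between clean and noisy posteriors that this abstract relies on can be illustrated with a small sketch. It assumes the common convention that `T[i][j]` is the probability of a clean label `i` being observed as noisy label `j`; the matrix and posteriors are made-up numbers, and the code shows only the generic transition-matrix identity, not the LDR method itself.

```python
from typing import List


def noisy_from_clean(clean: List[float], T: List[List[float]]) -> List[float]:
    """Noisy posterior: p(noisy=j | x) = sum_i p(clean=i | x) * T[i][j]."""
    n = len(clean)
    return [sum(clean[i] * T[i][j] for i in range(n)) for j in range(n)]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def clean_from_noisy(noisy: List[float], T: List[List[float]]) -> List[float]:
    """Recover the clean posterior by solving T^T @ clean = noisy."""
    n = len(noisy)
    T_transposed = [[T[i][j] for i in range(n)] for j in range(n)]
    return solve(T_transposed, noisy)


# Hypothetical 3-class example: rows of T sum to 1.
T = [[0.80, 0.15, 0.05],
     [0.10, 0.80, 0.10],
     [0.05, 0.15, 0.80]]
clean = [0.7, 0.2, 0.1]

noisy = noisy_from_clean(clean, T)
recovered = clean_from_noisy(noisy, T)
print([round(p, 3) for p in noisy], [round(p, 3) for p in recovered])
```

The recovery step only works when the transition matrix is invertible and well estimated, which is why the quality of the estimate matters so much in this line of work.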
Multi-modal named entity recognition (NER) and relation extraction (RE) aim to leverage relevant image information to improve the performance of NER and RE. Most existing efforts largely focused on directly extracting potentially useful information from images (such as pixel-level features, identified objects, and associated captions). However, such extraction processes may not be knowledge aware, resulting in information that may not be highly relevant. In this paper, we propose a novel Multi-modal Retrieval based framework (MoRe). MoRe contains a text retrieval module and an image-based retrieval module, which retrieve related knowledge of the input text and image in the knowledge corpus respectively. Next, the retrieval results are sent to the textual and visual models respectively for predictions. Finally, a Mixture of Experts (MoE) module combines the predictions from the two models to make the final decision. Our experiments show that both our textual model and visual model can achieve state-of-the-art performance on four multi-modal NER datasets and one multi-modal RE dataset. With MoE, the model performance can be further improved and our analysis demonstrates the benefits of integrating both textual and visual cues for such tasks.
Ultra-fine entity typing (UFET) aims to predict a wide range of type phrases that correctly describe the categories of a given entity mention in a sentence. Most recent works infer each entity type independently, ignoring the correlations between types: for example, when an entity is inferred to be a president, it should also be a politician and a leader. To address this, we use an undirected graphical model called the pairwise conditional random field (PCRF) to formulate the UFET problem, in which each type variable is influenced not only by the input through a unary potential but also by all the other type variables through pairwise potentials. We use various modern entity typing backbones to compute the unary potentials, and derive the pairwise potentials from type phrase representations that both capture prior semantic information and facilitate accelerated inference. We use mean-field variational inference for efficient type inference on very large type sets and unfold it as a neural network module to enable end-to-end training. Experiments on UFET show that Neural-PCRF consistently outperforms its backbones at little cost and achieves performance competitive with the cross-encoder based SOTA while being thousands of times faster. We also find Neural-PCRF effective on a widely used fine-grained entity typing dataset with a smaller type set. We package Neural-PCRF as a network module that can easily be plugged onto multi-label type classifiers and release it at https://github.com/modelscope/adaseq/tree/master/examples/NPCRF.
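To make the mean-field idea in this abstract concrete, here is a minimal sketch over binary type variables. It is a simplification under stated assumptions, not the Neural-PCRF module: each pairwise potential is reduced to a single score `pairwise[i][j]` that applies only when both types are on, the unary logits and pairwise scores are made-up numbers, and the update is run for a fixed number of iterations (the "unfolded" form).

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def mean_field(unary: List[float], pairwise: List[List[float]],
               num_iters: int = 3) -> List[float]:
    """Parallel mean-field updates: q_i = sigmoid(u_i + sum_{j != i} W_ij * q_j)."""
    n = len(unary)
    q = [sigmoid(u) for u in unary]  # initialise from the unary scores alone
    for _ in range(num_iters):
        # The comprehension reads the previous q, so all variables update in parallel.
        q = [
            sigmoid(unary[i] + sum(pairwise[i][j] * q[j] for j in range(n) if j != i))
            for i in range(n)
        ]
    return q


# Hypothetical example with three types.
types = ["president", "politician", "city"]
unary = [2.0, -0.5, -2.0]          # backbone logits: confident only about "president"
pairwise = [
    [0.0, 2.0, -1.0],              # president <-> politician co-occur; city conflicts
    [2.0, 0.0, -1.0],
    [-1.0, -1.0, 0.0],
]

independent = [sigmoid(u) for u in unary]
q = mean_field(unary, pairwise, num_iters=3)
for name, before, after in zip(types, independent, q):
    print(f"{name}: {before:.3f} -> {after:.3f}")
```

In this toy case the independent classifier would reject "politician", while the pairwise term lets the confident "president" prediction pull it above 0.5.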
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
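Speech entity linking aims to recognize and disambiguate named entities in spoken language. Conventional methods suffer badly from unconstrained speech styles and the noisy transcripts produced by ASR systems. In this paper, we propose a novel approach called Knowledge Enhanced Named Entity Recognition (KENER), which focuses on improving robustness by painlessly incorporating proper knowledge in the entity recognition stage, thereby improving the overall performance of entity linking. KENER first retrieves candidate entities for a sentence without mentions, and then uses the entity descriptions as extra information to help recognize mentions. The candidate entities retrieved by a dense retrieval module are especially useful when the input is short or noisy. In addition, we investigate various data sampling strategies and design effective loss functions to improve the quality of the retrieved entities in both the recognition and disambiguation stages. Finally, a linking-with-filtering module is applied as the final safeguard, making it possible to filter out wrongly recognized mentions. Our system took first place in Track 1 of NLPCC-2022 Shared Task 2.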
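Although Transformers have successfully transitioned from their language modeling origins to image-based applications, their quadratic computational complexity remains a challenge, particularly for dense prediction. In this paper, we propose a content-based sparse attention method as an alternative to dense self-attention, aiming to reduce the computational complexity while retaining the ability to model long-range dependencies. Specifically, we cluster and then aggregate key and value tokens as a content-based way of reducing the total token count. The resulting clustered token sequence retains the semantic diversity of the original signal but can be processed at a lower computational cost. We further extend the clustering-guided attention from single scale to multiple scales, which benefits dense prediction tasks. We name the proposed Transformer architecture ClusTR and demonstrate that it achieves state-of-the-art performance on various vision tasks at lower computational cost and with fewer parameters. For instance, our ClusTR small model with 22.7M parameters achieves 83.2% Top-1 accuracy on ImageNet. Source code and ImageNet models will be made publicly available.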
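The idea of clustering key and value tokens before attention can be sketched as follows. This is a toy illustration under stated assumptions, not the architecture from the abstract: plain k-means with evenly spaced initial centroids stands in for whatever clustering the authors use, cluster members are aggregated by simple averaging, and the tokens are made-up 2-D vectors.

```python
import math
from typing import List

Vector = List[float]


def kmeans_assign(points: List[Vector], num_clusters: int, iters: int = 5) -> List[int]:
    """Plain k-means; initial centroids are evenly spaced input points (an assumption)."""
    centroids = [points[i * len(points) // num_clusters][:] for i in range(num_clusters)]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [
            min(range(num_clusters),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        for c in range(num_clusters):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def aggregate(vectors: List[Vector], assign: List[int], num_clusters: int) -> List[Vector]:
    """Average the vectors in each non-empty cluster."""
    out = []
    for c in range(num_clusters):
        members = [v for v, a in zip(vectors, assign) if a == c]
        if members:
            out.append([sum(col) / len(members) for col in zip(*members)])
    return out


def attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention with a numerically stable softmax."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        peak = max(logits)
        weights = [math.exp(l - peak) for l in logits]
        total = sum(weights)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))
        ])
    return outputs


# Four toy tokens forming two obvious groups.
keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
queries = keys

assign = kmeans_assign(keys, num_clusters=2)
clustered_keys = aggregate(keys, assign, num_clusters=2)
clustered_values = aggregate(values, assign, num_clusters=2)
output = attention(queries, clustered_keys, clustered_values)
print(assign, len(clustered_keys), [round(x, 3) for x in output[0]])
```

Each query now attends to 2 clustered tokens instead of 4, so the attention cost scales with the number of clusters rather than the full token count.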
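Successful machine-learning-based named entity recognition models can fail on texts from some special domains, such as Chinese addresses and e-commerce titles, which require adequate background knowledge. Such texts are also difficult for human annotators. In fact, we can obtain potentially helpful information from related texts that share some common entities to aid text understanding. One can then easily reason out the correct answer by referring to the related samples. In this paper, we propose enhancing NER models with related samples. We draw related samples from large-scale in-domain unlabeled data with a sparse BM25 retriever. To explicitly simulate the human reasoning process, we perform training-free entity type calibration by majority voting. To capture correlation features in the training stage, we propose modeling the related samples with a transformer-based multi-instance cross-encoder. Empirical results on datasets from the two domains above show the efficacy of our method.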
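The retrieval and voting steps in this abstract can be sketched roughly as below. Everything here is an assumption made for illustration: the BM25 variant (the common form with `k1`, `b`, and a `log(1 + ...)` IDF), the toy English corpus, the entity annotations attached to related samples, and the choice to count the model's own prediction as one vote.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def bm25_scores(query: Sequence[str], docs: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every tokenized document against the query with BM25."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def calibrate_type(entity: str, predicted_type: str,
                   related: List[List[Tuple[str, str]]]) -> str:
    """Majority vote over the types the same entity receives in related samples."""
    votes = Counter(t for anns in related for e, t in anns if e == entity)
    votes[predicted_type] += 1  # assumption: the original prediction also votes
    return votes.most_common(1)[0][0]


# Hypothetical in-domain corpus with entity annotations on each sample.
docs = [
    ["hangzhou", "xihu", "district", "wensan", "road", "no", "1"],
    ["xihu", "district", "hangzhou", "library"],
    ["shanghai", "pudong", "avenue"],
    ["xihu", "longjing", "tea", "shop"],
]
annotations = [
    [("xihu", "district")],
    [("xihu", "district")],
    [("pudong", "district")],
    [("xihu", "poi")],
]

query = ["hangzhou", "xihu", "wensan", "road"]
scores = bm25_scores(query, docs)
top = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:2]
calibrated = calibrate_type("xihu", "poi", [annotations[i] for i in top])
print(top, calibrated)
```

Here the model's initial guess of "poi" for "xihu" is outvoted by the two retrieved samples that tag it as "district", which is the kind of correction a human annotator would make after looking at similar addresses.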
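Rapidly growing neural network models are increasingly challenging to run on a single device. Model parallelism over multiple devices is therefore critical to training large models efficiently. Recent proposals suffer from either long processing times or poor performance. We therefore propose Celeritas, a fast framework for optimizing device placement for large models. Celeritas employs a simple but efficient model parallelization strategy in the Standard Evaluation and generates placement policies through a series of scheduling algorithms. We conduct experiments to deploy and evaluate Celeritas on numerous large models. The results show that, compared with most advanced methods, Celeritas not only reduces the placement policy generation time by 26.4% but also improves the model running time by 34.2%.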
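Segmenting plant point clouds to obtain high-precision morphological traits is essential for plant phenotyping and crop breeding. Although the boom in deep learning methods has prompted much research on plant point cloud segmentation, most works follow the common practice of hard voxelization-based or down-sampling-based methods. They are limited to segmenting simple plant organs and overlook the difficulty of handling complex plant point clouds with high spatial resolution. In this study, we propose a deep learning network, the Plant Segmentation Transformer (PST), to achieve semantic and instance segmentation of MLS (mobile laser scanning) oilseed rape point clouds, whose main characteristics are tiny siliques and dense points. PST is composed of: (i) a dynamic voxel feature encoder (DVFE) that aggregates per-point features at the raw spatial resolution; (ii) dual window sets attention blocks that capture contextual information; and (iii) a dense feature propagation module that produces the final dense point feature map. The results show that PST and PST-PointGroup (PG) achieve state-of-the-art performance in the semantic and instance segmentation tasks. For semantic segmentation, PST reaches 93.96%, 97.29%, 96.52%, 96.88%, and 97.07% in mean IoU, mean precision, mean recall, mean F1-score, and overall accuracy, respectively. For instance segmentation, PST-PG reaches 89.51%, 89.85%, 88.83%, and 82.53% in mCov, mWCov, mPerc90, and mRec90, respectively. This study extends the phenotyping of oilseed rape in an end-to-end way and demonstrates that deep learning methods have great potential for understanding dense plant point clouds with complex morphological traits.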